Objectives

  • Produce barplots using ggplot.
  • Set universal plot settings.
  • Describe what faceting is and apply faceting in ggplot.
  • Modify the aesthetics of an existing ggplot plot (including axis labels and colour).
  • Build complex and customized plots from data in a data frame.

Understanding how homicide rates changed before the modern era requires the help of historians and archivists. Manuel Eisner, a criminology professor at the University of Cambridge, and his colleagues published the Historical Violence Database, a compilation of data on long-term trends in homicide rates together with qualitative information such as the cause of death, perpetrator and victim. This database is limited to countries with relatively complete historical records on violence and crime – mainly Western Europe and the US. Here we will use a version of their dataset provided by the OurWorldInData project, based at the University of Oxford.

Starting in the second half of the nineteenth century, Western European regions have consistent police records of those accused of murder or manslaughter and annual counts of homicide victims. To go back further in time, reaching as far back as the thirteenth century, Eisner collected estimates (from historical records of coroner reports, court trials, and the police) of homicide rates made in over ninety publications by scholars.

Homicide rates – measured as the number of homicides per 100,000 individuals – up to 1990 are sourced from Eisner’s (2003) publication and the Historical Violence Database.
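To make the unit concrete, here is a minimal sketch of how a rate per 100,000 individuals is derived from a raw count. The numbers and variable names are hypothetical and chosen only for illustration; they are not taken from the dataset.

```r
# Hypothetical numbers for illustration only (not from the dataset)
homicides  <- 46        # homicides recorded in one year
population <- 200000    # people living in the region that year

# Rate = homicides per 100,000 individuals
rate_per_100k <- homicides * 100000 / population
rate_per_100k
```

A rate makes regions and periods with very different population sizes comparable, which a raw count of homicides cannot do.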

Questions

Your task is to assess whether homicide rates in Europe today are lower or higher than in the past. Use the provided dataset to explore, display and describe the long-run homicide rates for the five European regions: Italy, England, Germany, Netherlands and Scandinavia.

Get the library

library(tidyverse)

Load the data

You should always interrogate the source of your data: ask who compiled it, on what basis, what is missing, and how representative the data are. You can consult the OurWorldInData project as well as Eisner’s publications for initial insights.

# create the data folder if it does not exist yet, then download the dataset
dir.create("data", showWarnings = FALSE)
download.file("https://raw.githubusercontent.com/adivea/r-history/main/episodes/data/homicide-rates-across-western-europe.csv", destfile = "data/homicide-rates-across-western-europe.csv")

# load the data into R
Western_Europe <- read_csv("data/homicide-rates-across-western-europe.csv")

Inspect the data

How clean and analysis-ready is the dataset? Do you understand what the column names represent? What is hiding under “Entity”? What is the difference between rate and homicide number?

head(Western_Europe)
# A tibble: 6 × 4
  Entity  Code   Year `Homicide rate in Europe over long-term (per 100,000)`
  <chr>   <chr> <dbl>                                                  <dbl>
1 England <NA>   1300                                                     23
2 England <NA>   1550                                                      7
3 England <NA>   1625                                                      6
4 England <NA>   1675                                                      4
5 England <NA>   1725                                                      2
6 England <NA>   1775                                                      1
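Beyond head(), a few other calls can help answer the inspection questions. The sketch below is only one possible approach; it runs on a small stand-in tibble (named sample_rows here, an assumption) rebuilt from the six rows printed above, so you would swap in Western_Europe to inspect the full dataset.

```r
library(tidyverse)

# Stand-in for the downloaded data: the six rows printed by head() above
sample_rows <- tibble(
  Entity = "England",
  Code = NA_character_,
  Year = c(1300, 1550, 1625, 1675, 1725, 1775),
  `Homicide rate in Europe over long-term (per 100,000)` = c(23, 7, 6, 4, 2, 1)
)

# Column names, types and the first values of each column
glimpse(sample_rows)

# What is hiding under "Entity"? Count the rows per distinct value
entity_counts <- count(sample_rows, Entity)
entity_counts

# Time span covered, and the number of missing values in each column
range(sample_rows$Year)
colSums(is.na(sample_rows))
```

On the full dataset, count() shows which regions are present and how many observations each one has, which is worth knowing before comparing their trends.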

OK, the data look good, except that the name of the fourth column, Homicide rate in Europe over long-term (per 100,000), is very long and awkward to work with.

Wrangle and visualise the data

# YOUR CODE
names(Western_Europe)[4] <- "homicides_per_100k"
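Renaming by position works, but it silently renames the wrong column if the column order ever changes. As an alternative sketch, dplyr's rename() refers to the old column by name. The small tibble sample_rows below is a stand-in built from the rows printed by head(); with the real data you would use Western_Europe instead.

```r
library(tidyverse)

# Stand-in for the downloaded data: the six rows printed by head() earlier
sample_rows <- tibble(
  Entity = "England",
  Code = NA_character_,
  Year = c(1300, 1550, 1625, 1675, 1725, 1775),
  `Homicide rate in Europe over long-term (per 100,000)` = c(23, 7, 6, 4, 2, 1)
)

# rename(new_name = old_name); backticks are needed because the old name
# contains spaces and punctuation
renamed <- sample_rows %>%
  rename(homicides_per_100k = `Homicide rate in Europe over long-term (per 100,000)`)

names(renamed)
```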

Now that you have looked at what the data contain and what they represent, and streamlined the column name, let’s see what big picture they show.

Plot the long-term trend of homicides

ggplot(data = Western_Europe) + 
  #....YOUR CODE GOES HERE

# One possible solution
ggplot(data = Western_Europe) + 
  geom_line(mapping = aes(x = Year, 
                          y = homicides_per_100k,
                          color = Entity)) +
  labs(x = "Year",
       y = "Number of Homicides per 100,000 people",
       title = "Homicide rate in Europe from 1300-2000")

Alright, the homicide rates all appear to decline over time. What a comfort. But the visualisation is not very clear. Let’s check the rates for individual countries.

Separate the homicide rates of individual countries for easier viewing

You can visualize each country’s trend in a separate panel by adding facet_wrap() to the ggplot with + and feeding it the country column. If in doubt, check your ggplot tutorial for exact usage and your data for the name of the country column.

ggplot(data = Western_Europe) + 
  #... YOUR CODE

# One possible solution
ggplot(data = Western_Europe) + 
  geom_line(mapping = aes(x = Year, 
                          y = homicides_per_100k,
                          color = Entity)) +
  facet_wrap( ~ Entity, ncol = 2) +
  labs(x = "Year",
       y = "Number of Homicides per 100,000 people",
       title = "Homicide rate in Europe from 1300-2000")

Fine-tune the faceted ggplot

  1. In the faceted plot above, move the legend from its current position on the side to below the facets, and label it “Country” instead of “Entity”. For the former, explore theme(), and for the latter, try googling. Knowing how to phrase a question so that it zeroes in on the problem is a skill that requires practice.
ggplot(data = Western_Europe) + 
  geom_line(mapping = aes(x = Year, 
                          y = homicides_per_100k,
                          color = Entity)) +
  facet_wrap( ~ Entity, ncol = 2) +
  labs(x = "Year",
       y = "Number of Homicides per 100,000 people",
       title = "Homicide rate in Europe from 1300-2000",
       color = "Country") +
  theme(legend.position = "bottom")
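If you want the same settings on every plot without repeating theme() each time, ggplot2 lets you set them once per session. The following is a minimal sketch, not the lesson's own solution: the choice of theme_bw() and the base font size are assumptions, and the tibble england is a stand-in built from the England rows printed by head() earlier.

```r
library(tidyverse)

# Set a default theme for every plot created after this point in the session
# (theme_bw and base_size = 12 are arbitrary choices for illustration)
theme_set(theme_bw(base_size = 12))

# Adjust single elements of the active theme, here the legend position
theme_update(legend.position = "bottom")

# Stand-in for the lesson data: the England rows printed by head() earlier
england <- tibble(
  Entity = "England",
  Year = c(1300, 1550, 1625, 1675, 1725, 1775),
  homicides_per_100k = c(23, 7, 6, 4, 2, 1)
)

# No theme() call is needed here; the defaults set above apply automatically
p <- ggplot(data = england) +
  geom_line(mapping = aes(x = Year,
                          y = homicides_per_100k,
                          color = Entity)) +
  labs(x = "Year",
       y = "Number of Homicides per 100,000 people",
       color = "Country")
p
```

Settings made with theme_set() and theme_update() last until the R session ends or until you set a different theme, so it is common to place them near the top of a script, right after loading the libraries.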

Learn to edit this rmarkdown

For this task, download the rmarkdown script that generated this lesson. Its extension is .Rmd, and it is a flexible type of document that lets you combine executable R code and its output with text in a single document. It can look neat and is useful for presenting one’s research as well as for creating assignments for students. If you want to learn more about the format, consult episode 06 among the training guides. Once you have the original script, start with the top section, the YAML header, and then work your way down.

Are we more civilized today?

Finally, enjoy your accomplishments and ponder the main question behind this data: are we more civilized today?

Compare the trends in homicide rates with the pattern of reign duration among Danish rulers through time. How would you characterize the relationship between the two time series?

Well done!